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Abstract 

A  random  signal  is  observed  in  independent  random  noise.  Ue  are 
addressing  the  problem  of  finding  the  optimum  signal  estimate  that  is 
constrained  to  lie  in  a  given  linear  subspace.  The  optimality  is  defined 
in  the  sense  of  weighted  mean  square  error.  In  the  second  step,  we  find 
the  optimum  linear  subspace  of  given  dimensionality.  It  is  shown  tb  be 
the  linear  manifold  spanned  by  the  eigenvectors  of  the  simultaneous 
diagonalization  of  the  signal  and  noise  covariance,  that  correspond  to  the 
largest  eigenvalues.  The  result  is  valid  for  both  discrete  and  continuous 
time.  For  large  observation  time  and  stationary  signals,  it  is  shown 
that  the  constrained  optimal  estimate  is  determined  by  the  two  spectral 
densities  and  a  weighted  Fourier  Transform  of  the  noise  observations.  The 
above  result  applies  to  both  discrete  time  and  continuous  time  signals. 

The  Wiener  filter  emerges  as  a  special  case  of  the  constrained 
filtering  estimate,  when  the  linear  subspace  is  enlarged  to  coincide  with 
the  total  measurement  space. 


Research  supported  by  the  Air  Force  Office  of  Scientific  Research  by 
Grant  AFOSR  82-0030. 


Key  Words:  Signal  Filtering,  Data  Compression,  and  Data  Representation. 


I.  General  theory  for  discrete  time  signals 


Consider  a  deterministic  signal  xcRn  ■  n- dimensional  Euclidean 
space.  Let  V01  ■  {V.  .  .  .  V  ;  m  <  n)  be  a  collection  of  orthonormal 

X  m  — 

column  vectors  V^eRn.  The  minimum  square  error  norm  representation  of  x 

In  the  llniar  manifold  of  V®  is  defined  as  the  solution  (a,  ...  a  )  that 
_  l  m 

a  2 

minimizes  | |x  -  £  a.  V.  j |  .  It  is  well  known  from  linear  algebra  that  the 

k»l 

best  approximation  x  on  Vs  is: 

-  A  ®  T 

x  ■  Px  ■  (  ][  V.  V  )x;  P  ■  projection  operator  (1) 

k-1  *  * 


If  x  is  a  random  signal  with  zero  mean  and  covariance  matrix 
A  T 

Rx  ■  Exx  ,  then  it  is  easy  to  demonstrate  that:  [1] 


Sm(vn)  &  e| 


X  -  X 


n 

I 

k*nri-l 


T 

V,  R  V, 
k  x  k 


(2) 


where  x  is  the  best  approximation  given  by  equation  (1),  and  S  (V°)  is 

m 

the  minimum  mean  square  error  achievable  for  representation  of  x,  using 
as  basis  the  set  Vn. 

Suppose  now  that  we  wish  to  choose  V11  so  as  to  minimize  S  (Vn) .  The 

m 

minimization  is  achieved  if  V11  is  chosen  as  the  set  of  eigenvectors  of  R^ 
[1],  and  the  resulting  minimum  value  of  S  (Vn)  is:  [1] 


S*(Vn) 

in 


l  \ 

k-mfl 


where  ^  *2  —  *  •  ’  —  are  t*ie  ei8envalues  °*  ^x»  ordered  in  decreasing 


magnitude . 


2 


Consider  now  the  case  of  estimating  x,  on  the  basis  of  noisy  data. 
The  available  observation  is: 


y  -  x  +  *, 


Exx  -  Rx, 


Ex  -  Ez  -  0, 


Ess  *  Rz, 


T 

Exz  -  0 


l*£l  +  o 


(4) 


Let  x  be  an  estimate  of  x.  The  weighted  mean  square  error  S  (V  )  is 

tn 


defined  as: 


S.<v")  -E||x-x||2.l.l||^-i1||J, 


(5) 


-  fV1 

1  z 


x,  *  R_  '  x  , 


_-l/2  « 

Xlm\  X 


1/2 

and  R  is  the  unique  symmetric  positive  definite  square  root  of  R  , 
z  z 

—1/9  —1/9 

which  exists  because  |  R  J  0.  Let  y..  ■  R  y,  z1  ■  R  z.  Then, 

Z  X  Z  X  z 

-1/2 

multiplying  (4)  by  R^  we  achieve  whitening  of  the  noise:  *  xj+z^, 

T 

Ezl2l  “  I' 

We  consider  now  estimates  of  of  the  form: 


m 


X-  -  I  c  V,  V.  y. 
1  .  L,  k  k  k  1 

k"l 


(6) 


From  equation  (5),  the  mean  square  error  is  found  to  be: 

smcv“)  4  Ellxj  -  ijll2  -  |  vJrv  +  J  4  [V2RVk  +  11 

k*nH-l  k-1 


-  2c.  V?RV.  +  vJrv. 
k  k  k  k  k 


(7) 


where  R  -  R"1/2  R  r"1/2. 

Z  X  z 


If  we  minimize  S  (V11)  over  the  choice  of  the  constants  c,  .  .  .  c  , 
m  l  ■ 

we  find  that  the  minimizing  values  are: 


ck  -  (V*  R  V  <vk  R  vk  ♦  I)-1 


(•) 
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with  a  resulting  minimum  of  S  (V11) : 


Sm(vD)  "  E  g(wk)  +  l  wk 

k-1  k-nri-1 


(9) 


where: 


A  vT 


V*  R  V ,  g(z)  -  1  -  (s  +  l)"1  <  z 


CIO) 


k  k  **  ’k 

Our  next  effort  is  to  choose  V11  so  that  S*(Vn)  is  minimized.  From 

in 

(9),  (10)  we  observe  that  {w^}  have  to  be  minimized,  and  in  such  a  manner 
that  the  m  smallest  achievable  values  are  assigned  to  the  second  partial 
sum  in  equation  (9),  while  the  other  n-m  larger  ones  are  assigned  to  the 


*  •  *»  wmfl*  wm . V 


first  sum.  So,  we  first  minimize  w  ,  then  w  ,, 

n  n*i 

At  each  step  of  minimizing  Wj  over  Vj  such  that  j |v^ [ |  ■  1,  the  previous 

vectors  V..,  V, ,  ...»  V  have  been  found  and  are  fixed.  Hence  V.  has 
j+1  j+2’  n  j 

to  be  orthogonal  to  the  previously  found  vectors  {V...  V..0  .  .  .  V  }. 

J+-1  jtz  n 

From  matrix  theory  [2 I,  we  observe  that  the  above  procedure  for 


finding  the  optimum  set  {V^ }  produces  the  eigenvectors  {Q^}  of  R.  Let 
>  X2  il  •  •  •  2L  be  the  corresponding  eigenvalues.  Let  us  denote  by 
S**  the  minimum  over  {Vn}  value  of  S*(Vn).  Then, 


m 


S**  -  f  [i  -  (i  +  x  r1]  +  i  x 

k-l  *  k-nrfl  * 


(ID 


The  corresponding  optimal  c^'s  are: 


■=»  •  V(1  +  V 


-1 


(12) 


and  the  optimal  estimate  x  is: 


x  -  R 


\n  il '  Ji >k<1  +  >k>'1  Qk  ,k  R 


-1  —1/2  „  „T  .-1/2 


(13) 
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It  is  well  known  that  the  eigenvalues  of  R  satisfy  the  equation 
jwR^  -  Rx|  -  0,  and  are  the  eigenvectors  of  R^  with  respect  to  R^.  [1] 
For  the  case  m**n,  the  optimum  estimator  of  x  is  Identical  to  the  Wiener 
filter,  as  can  be  easily  verified.  Thus,  the  new  optimal  representation 
or  estimation  of  the  noisy  signal  x  can  be  viewed  as  a  constrained  linear 
estimate  of  prespecified  dimensionality. 

The  main  computational  difficulty  of  the  representation  is  the 
evaluation  of  the  eigenvalues  and  eigenvectors  {A^*  Qfc;  k*l,  .  .  . ,  n} 
of  the  matrix  with  respect  to  the  matrix  Rq.  There  are  some  special 
cases  in  which  the  eigenvectors  and  eigenvalues  can  be  evaluated  in 
closed  form,  described  next. 


-  5  - 


l.  Suppose  that  x  is  produced  by  unifor*  sampling  every  h  sec.  of 
a  periodic  process  with  period  nh.  Then  R^  is  a  circulant:  (qg  “  qg) 


q0  ql  q2  '  *  *  Vl 

Vl  q0  ql  *  *  '  qn-2 


R 


(14) 


i.e.,  each  row  is  a  cyclic  permutation  of  the  first  one.  Let,  also  x  be 
white  noise  with  R  -  O2  I  .  The  matrix  R  -  R  R_  R-"3*  *  o  2  R  has  real 

2  D  s  *  «  * 

eigenvalues  [3] ,  [4] : 

X,  *  a~2  l  q  ,  exp  [-2*j  (s-l)  (k-l)n"1);  k  •  1,  .  .  n.  (15) 
s-l 

The  corresponding  eigenvectors  are: 

Qk  -  {n"**  exp  t-2wj  (k-1)  (s-l)n"1] ;  s-l,  .  •  nh  (16) 

Hence,  it  is  an  easy  matter  to  pick  the  m  largest  eigenvalues  from  (15), 
the  corresponding  eigenvectors  from  (16),  and  then  formulate  the  optimum 
estimate  according  to  equation  (13) . 


Example  2.  Tridiagonal  correlation. 
Let: 


R 


K 


U*> 


The  eigenvalues  o£  (17")  are  [5] : 

Xfc  -  c"2(l  -  2 J q j cos [kx (n  +  l)-1]);  k  -  1,  ...»  n  (18) 

The  corresponding  eigenvectors  are  [5] : 

*  {[2/(n  +  1)]**  8in[ksir(n  +  1)  2] ;  s  *  1,  2,  ....  n)  (19) 

Ue  will  consider  next  the  case  of  stationary  observations  and  large  n. 

Then,  the  matrices  R  ,  R  are  Toeplitz.  Let  8  (X),  s  (X);  Xe[0,  2x]  be  the 

X  z  x  z 

spectral  densities  of  the  processes  x,  z,  assumed  to  be  strictly  positive  for 
all  Xe[0,  2tt  ] .  We  also  assume  that  the  autocorrelation  sequences  for  x,  z  are 
square  summable.  Then,  according  to  the  theory  developed  in  [4] ,  we  can 
determine  the  asymptotic  distribution  of  the  eigenvalues  of  the  matrix  R  and 
the  corresponding  eigenvectors.  The  eigenvalues  are  for  k  “  1,  .  .  .,  n: 

X.  ~  s  (27rkn”1)  s  'l(2jrkn"1>  (2Q) 

K  *  X  Z 

Qk  -  (n  ^  exp  [-  2tt  j  (k-1)  (s-l)n  ^];  s  ■  1,  .  .  . ,  n)  (21) 

The  matrix  R  has  the  following  approximate  expression  for  large  n: 
z 

Rz  -I  sz(2vkn"1)  Qk  <)£  (22) 

k-1 

For  the  optimal  representation  of  x  by  an  m  dimensional  subspace,  the  m 
indices  corresponding  to  the  m  larges  eigenvalues  Xk  according  to  equation  (20) 
will  provide  the  solution.  The  corresponding  eigenvector  Qk  given  by  eq.  (21) 
will  determine  the  estimate  ft,  according  to  eq.  (13).  From  equations  (11), 

(20)  we  can  derive  the  resulting  mean  square  error: 

S**  -  l  [1  -  [sx(2*kn"1)dz'’1(2irkn”1)  +  l]*1)  ♦ 

♦  \  s  Uxkn'1)*  “1(2skn"1) 

« - ,  a  x  z 


(23) 
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We  now  let  m  be  a  fixed  fraction  of  n,  m  *  np,  0  <  p  <  1.  The  question  we 
consider  now  is:  If  a  fraction  p  of  the  signal  coordinates  are  allowed,  what 
is  the  best  achievable  mean  square  error  when  n -*■<»?  We  are  specifically 
interested  in  the  asymptotic  per  sample  mean  square  error,  defined  as: 

S**(p)  A  lim  n  ^S*  (under  the  condition  m  =  np)  (24) 

"  n~°  np 


From  (23),  (24)  we  find: 


S**(p) 


(2tt)-1J  [1  -  [s  (X)s”1  (X)  +  l]-X]dX  + 

I  X  2 

P 


,-l, 


+  (2TT)_1  f  sx(X)  s"1  (X)  d\ 


-1 


(25) 


where  I  is  a  subset  of  the  interval  [0,  2^],  of  Lebesgue  measure  2irp,  and  such 
that  the  largest  values  of  sx(^)s2^  (^)  are  concentrated  in  1^.  In  other 
words,  for  any  X.el  ,  XXl  ,  we  will  have  s  (X1)s  1  (X.. )  >_  s  (X7)s  1  (X7). 

Our  conclusions  are  useful  for  suboptimal  filtering  and  compression  of 
noisy  data.  The  compression  ratio  when  the  subspace  of  dimensionality  m  *  np 
is  used,  is  equal  to  p 

From  the  theory  of  asymptotic  approximation  of  Toeplitz  matrices  by 
circulant  ones,  (13],  [4])  we  find  that  using  (22),  for  large  n,  we  have: 


^  s  l  (2irkn_1)  Qk  qj;  if*5  S  f  (2ffkn_1)  Qk  Q* 


k-1 


k=l 


(26) 


where  are  the  Finite  Fourier  Transform  vectors,  defined  by  eq.  (21). 

Substituting  the  approximations  (26) ,  (20),  (21)  into  (13),  and  using 


the  orthogonality  of  after  some  algebra  we  find  the  following  expression 

for  x: 


(27) 


k 


Ji  Qk  Ck  yk 


where: 

c.  A  [1  -  [s  (2iTkn_1)s"1(2TTkn'1)  +  l]"1]  (28) 

k  *  x  z 

and 


are  the  Discrete  Fourier  Transform  (DFT)  coefficients  of  the  vector  y.  They 
can  be  evaluated  by  Fast  Fourier  Transform  methods. 

Equation  (27)  provides  the  following  conclusion.  If  one  wishes  to 

represent  x  using  m  of  the  DFT  coefficients,  and  a  noisy  version  of  x  is 

only  available,  the  mean  square  optimal  representation  will  be  achieved  through 

(27)  -  (29).  assuming  that  s  ,  s  are  known  a-priori.  The  weighting  coefficients 

X  z 

cfc  essentially  act  as  optimal  filtering  coefficients  in  the  orthogonal 
directions  of  the  representation  subspace.  The  directions  are  chosen  so 
that  the  signal  to  noise  ratio  sx(2irkn  ^)  *  s^  (2irkn  ^)  is  maximal. 

It  should  be  stressed  that  the  estimate  x  is  given  by  the  approximate 
expression  (27).  The  question  of  how  well  the  approximate  estimate  performs 
as  compared  to  the  exact  one,  is  of  interest  but  not  pursued  here.  For  the 
unconstrained  filtering  problem,  Pearl  has  derived  interesting  bounds  for 
the  variance  of  the  approximate  filtering  estimate  based  on  the  Discrete 
Fourier  Transform  (6j.  We  believe  that  his  work  on  asymptotic  equivalence 
of  spectral  representations  [7]  is  applicable  in  deriving  bounds  for  the  mean 
square  error  performance  of  the  approximate  constrained  estimate  (27). 


II.  Continuous  time  signals 


We  will  consider  now  extension  of  the  previous  theory  to  continuous 
time  random  signals  observed  in  the  presence  of  noise.  Let 

y(t)  *  x(t)  +  z(t);  0  <  t  <  T 


where  x(t),  z(t)  are  zero  mean  processes  of  finite  mean  square  value,  with 

correlation  functions  R  (t,  s),  R  (t,  s)  correspondingly. 

X  z 

Instead  of  developing  a  new  theory  parallel  to  the  development  of  Section 
1,  we  will  proceed  stating  the  continuous  time  analogous  results.  All  of  the 
continuous  time  development  can  be  made  rigorous  by  using  Kadota's 
simultaneous  orthogonal  expansion  of  two  operators  [8],  [9]. 

Consider  the  operator: 


R  =  R(t ,  s) 


R  R 
x  z 


-is 


(30) 


— 

where  R  is  the  inverse  of  the  square  root  of  R  (t,  s),  appropriately  defined, 
z  z 

The  assumption  that  R  is  a  bounded  and  densely  defined  operator  has  to  be 

made  [9).  Then,  R  has  a  set  of  orthonormal  eigenfunctions  {^(t);  k  *  1,  2,  ...} 

and  corresponding  eigenvalues  X2  >_  .  .  satisfying  the  integral  equation 


/  R(t,  s)4>k(s)ds  =  Wt) 


(31) 


Let 


z.  (t)  -  |  R  (t,  s)<j>.  (s)ds  (32) 

K  0 

Then  (z  (t)}  are  the  eigenfunctions  of  R  with  respect  to  R  with  corresponding 
(C  x  z 

eigenvalues  {X^},  In  analogy  to  the  time  discrete  case.  The  mean  square  error 
of  estimating  x(t)  using  m  coordinates  only,  will  be  minimized  if 
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the  eigenfunctions  corresponding  to  the  m  largest  eigenvalues  are  used.  The 
eigenfunctions  zk(t)  satisfy  the  Integral  equation: 

T  T 

/  R  (t,  s)z.  (s)ds  -  X  /  R  (t,  s)z,  (s)ds  (33) 

0  0 

and  the  orthogonality  condition  with  respect  to  R  : 

z 

T  T 

/  /  zfc(t)Rz(t,  s)z*(s)dtds  -  (34) 

The  estimate  x(t),  after  some  manipulations,  is  found  to  have  the  expression: 

m  .  T  T 

x(t)  -  l  X  (1  +  X.  )  /  R  (t,  s)z.  (s)ds  •  /  z.  (u)y(u)du  (35) 

k*l  0  0 

The  resulting  mean  square  error  is: 


S** 

m 


i  [i  -  ex.  + 1)  i  +  i 

k-1  k«m+l 


(36) 


With  the  exception  of  some  special  cases  mentioned  in  [8] ,  the  evaluation  of 
eigenvalues  and  eigenfunctions  cannot  be  achieved  in  closed  form.  Numerical 
solutions  are  required,  in  general.  Hence,  equations  (33)- (36)  are  of 
theoretical  interest  but  limited  practical  usefulness. 

There  is  a  special  case  of  practical  importance,  in  which  simplifications 
are  possible.  Suppose  that  x(t),  z(t)  are  wide  sense  stationary  processes 
of  finite  average  power,  and  are  observed  on  [-T/2,  T/2] .  Suppose  also  that 
the  corresponding  spectral  densities  s  (f),  s  (f)  are  positive  on  [-W,  W]  and 
zero  otherwise. 

For  large  T,  Van  Trees  develops  in  [10]  approximations  to  the  eigenvalues 

and  eigenvectors  of  an  integral  equation  of  the  type  (33),  but  for  the  special 

case  R  (t,  s)  ■  6 ( t  -  s).  Straightforward  extension  of  the  approach  of  Van 
z 

Trees  yields  the  approximation: 
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zfc(t)  s  T_ls  exp(j2irkT_1t) ;  |t|<T/2 

=  8jc(kT~1)  •  s'1  (kT_1);  -WT  £  k  <_  WT 

for  large  T.  Note  that  the  approximate  numbers  of  eigenvalues  or  degrees  of 
freedom  of  the  signals  Involved  are  2WT,  equal  to  the  number  mandated  by  the 
sampling  theorem  for  bandllmited  random  processes. 

Suppose  now  that  a  fraction  p  of  the  2WT  degrees  of  freedom  Is  used.  The 
resulting  approximation  to  the  mean  square  error  Is: 

2TW 

]  +  I  \ 

k-2WTp+l 

where  the  ordering  of  eigenvalues  Is: 


2TWp 

S**(T)  =  l  [1  -  (1  +  X.  ) 
p  k-1  K 


X1  -  X2  -  ‘  *  *  -  X2WTp  -  ‘  *  *  -  X2TW 


We  can  evaluate  now  the  asymptotic  representation  error  per  unit  time  as 

S**  A  lira  T_1  S**  (T)  -  f  s  (f)  [s  (f)  +  s  (f)]_1  df  + 

P  “  Txx.  P  i  x  x  z 


+  X  sx(f)s“1  (f)df. 

XP 

where  I^ct-W,  W]  is  a  subset  of  Lebesque  measure  2Wp,  such  that  the  ratio 

s  (f)s  ^(f)  is  maximal,  i.e.  for  all  f,  e  I  ,  f,  i  I  we  have 
x  z  x  p  4  p 

<f2>- 


J 


Conclusions 


We  have  developed  some  intuitively  sensible  conclusions  on  the  constrained 
representation  of  noisy  signals,  when  the  signal  and  noise  covariances  are 
known.  We  have  shown  that  the  eigenvalue-eigenvector  expansion  provides  the 
minimum  mean  square  error  subspace  of  fixed  dimensionality.  The  use  of  eigenvectors 
and  associated  subspaces  has  been  proven  useful  in  various  signal  and  image 
processing  applications,  assuming  no  noise  ([10]  -  [13]).  Problems  in  which 
the  developed  techniques  are  applicable  are  the  simultaneous  compression  and 
filtering  of  noisy  speech  and  images,  and  the  processing  of  sensor  array  data. 

(see,  for  example,  [14]). 


! 
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